ohi logo
OHI-Northeast | OHI Science | Citation policy

knitr::opts_chunk$set(fig.width = 10, fig.height = 6, fig.path = 'figs/', message = FALSE, warning = FALSE)

dir_git <- '~/github/ne-prep'
source(file.path(dir_git, 'src/R/common.R'))  ### an OHI-NE specific version of common.R

library(tidyverse)
#install.packages("striprtf")
library(striprtf)
library(zoo)
library(trelliscopejs)

dir_anx <- file.path(dir_M, 'git-annex/neprep')

Summary

The jobs data layer is used as an input for the Liveilhoods subgoal. We use data from the National Ocean Economics Program (NOEP). NOEP provides data on the number of jobs that directly or indirectly depend upon the ocean.

We calculated annual job growth rates by comparing total employment to the average number of jobs over the previous 3 years. The target for region job growth is set to be equal to or greater than the national average, calculated using data from the Bureau of Labor Statistics.


Data source

National Ocean Economics Program (NOEP)

Downloaded: Manually downloaded by state from website on July 3, 2018.
Description: Total number of jobs and wages per sector for RI, ME, MA, CT, NY and NH counties from 2005 to 2015. The data also include number of establishments and GDP for each sector - state - year.
Native data resolution: County level
Time range: 2005 - 2015
Format: Tabular

NOTES

The data was initially cleaned in the clean_noep_data.R script. Further cleaning specifically related to the Employment data takes place in this script.


Data cleaning

Read in the cleaned NOEP data.

noep_data = read.csv("data/clean_noep_data.csv")

Since we are only interested in the employment data I remove the other columns from the NOEP data and filter for “All Ocean Sectors” since this goal is not sector specific.

coast_jobs <- noep_data %>%
  select(-X, -Wages, -Establishments, -GDP) %>%
  filter(Sector == "All Ocean Sectors")

Visualize data

ggplot(coast_jobs, aes(x = Year, y = Employment, color = County)) +
  geom_line() +
  theme_bw() +
  facet_trelliscope(~rgn_name, scales = "free", self_contained = T, width = 800)

Meta-analysis

To identify some inconsistencies I see in the data, I’m going to take a look at the reported employment values at both the county level and statewide. One would expect that the sum of the county employment values would equal what is reported for the “Statewide” employment values. It seems that this is not the case.

states <-  c("Maine", "New Hampshire", "Rhode Island", "Massachusetts", "Connecticut", "New York")

meta <- function(state){
  
  all <- coast_jobs %>%
    filter(State == !!state,
           str_detect(County, "All")) %>%
    select(Year, Employment) %>%
    distinct() %>%
    rename(all_ctys_employment = Employment)
  
  out <- coast_jobs %>%
    filter(State == !!state,
           str_detect(County, "All") == FALSE) %>%
    select(State, County, Year, Employment) %>%
    distinct() %>%
    group_by(Year) %>%
    summarize(totals = sum(Employment, na.rm = T)) %>%
    left_join(all) %>%
    rename(county_totals = totals,
           statewide = all_ctys_employment) %>%
    gather(key = spatial_res, value = Employment, -Year) %>%
    mutate(State  = !!state)
  
  return(out)
}

t <- map_df(states, meta) %>%
  distinct()

ggplot(t, aes(x = Year, y = Employment, color = spatial_res)) +
  geom_line() +
  theme_bw() +
  facet_wrap(~State, scales = "free") +
  scale_color_manual(" ", labels = c("County", "State"), values = c("blue", "red")) 

There are some clear discrepancies in the dataset between the total number of jobs reported at the state level (red lines) and the sum of all employment numbers at the County level (blue lines). Massachusetts shows near-parallel trends in both county and statewide jobs so I am comfortable using the county level data. Since Massachusetts is split into two regions for this assessment, we will need to keep the county resolution of this data.

New Hampshire has significantly more statewide jobs beginning in 2011 due to data supression up until that point. I asked the folks at NOEP about NH specifically:

“Jamie, These data suppressions can occur at different geographic levels. Industry data that are suppressed at the county level can show up at the state or national level. For N.H. some Marine Transportation industry data are being revealed at the state level after 2010, but not for the counties. This is not uncommon with the BLS QCEW data where the disclosure rules are more strict.”

We can assume that if data were not suppressed pre-2011 statewide, we would also see parallel trends in New Hampshire as well. Operating under that assumption, we can use the county level data to calculate job growth rate trends. Rhode Island, Connecticut and Maine show low employment numbers in earlier years when adding up at the county level. This could be due to a lack of data. For example, Saghadoc county in Maine has no data up until 2010, when the jump happens. This suggests we should use the statewide data for Maine. There is no missing data in Connecticut or Rhode Island. This might suggest data suppression in those earlier years at the county level. Therefore the statewide data should be used.


Assign spatial scale to use

Using the information gained from the meta analysis above, I’m assigning which spatial scale to use for each of the six states. For Maine, Connecticut, New York and Rhode Island we are going to use the State level data, which means we filter for rows that say “All [state] counties” and remove the rest. For Massachusetts and New Hampshire we want to keep only the individual county information and then summarize total number of jobs from that data. Finally we join the two datasets into jobs_data_clean.

#select the data for ME, CT, NY, and RI, which is going to use the data reported for "All x counties"
state_data <- coast_jobs %>%
  filter(str_detect(County, "All"),
         State %in% c("Maine", "Connecticut", "Rhode Island", "New York")) %>%
  select(state = State, year = Year, rgn_id, rgn_name, rgn_employment = Employment)
  
#select the data for MA and NH
county_data <- coast_jobs %>%
  filter(str_detect(County, "All")== FALSE,
         State %in% c("New Hampshire", "Massachusetts")) %>%
  group_by(rgn_id, Year) %>%
  mutate(rgn_employment = sum(Employment, na.rm = T)) %>% #employment by region
  select(state = State, year = Year, rgn_id, rgn_name, rgn_employment) %>%
  distinct()

jobs_data_clean <- bind_rows(state_data, county_data)

ggplot(jobs_data_clean, aes(x = year, y = rgn_employment, color = rgn_name)) +
  geom_line() +
  theme_bw() +
  labs(x = "Year",
       y = "Number of jobs",
       color = "Region")


Calculate job growth

We want to calculate job growth rate for each year. To do this, we take the annual employment and divide it by the average employment of the previous 3 years. Since our dataset begins in 2005, we can not get growth rates for the years 2005-2007.

I also save an intermediate file, coastal_jobs_data.csv, which shows the actual employment numbers for each year and the average of the previous 3 years.

jobs_cst_ref <- jobs_data_clean %>%
  arrange(year) %>%
  group_by(rgn_id) %>%
  mutate(coast_jobs_3yr = rollapply(rgn_employment, 3, FUN = mean, align = "right", na.rm=F, partial = T), #calc the three year mean
         coast_jobs_prev_3yr = lag(coast_jobs_3yr, n = 1), #create a new column that aligns the avg employment from previous three years with the year with which we're comparing.
         coast_job_growth = ifelse(year %in% c(2005:2007), NA, (rgn_employment/coast_jobs_prev_3yr)-1)) %>%  #assign NA to the first three years in the time series because there is not enough data to calculate this rate. 2007 growth rate *should* be compared to average of 2004-2006. But we don't have that data
  write_csv("int/coastal_jobs_data.csv")

Let’s see how the data looks.

c <- jobs_cst_ref %>%
  select(rgn_id, year, rgn_employment, rgn_name, coast_jobs_prev_3yr) %>%
  gather(cat, jobs, -rgn_id, -year, -rgn_name)

ggplot(c, aes(x = year, y = jobs, color = cat)) +
  geom_line() +
  facet_wrap(~rgn_name, scales = "free") +
  ylab("Number of jobs") +
  xlab("Year") +
  theme_bw() +
  ggtitle("Regional employment in coastal jobs") +
  scale_color_manual(" ", labels = c("Mean employment over the \nprevious 3 years","Annual employment"), values = c("red", "blue"))


Setting targets

Using Bureau of Labor Statistics data, we can get the nationwide average job growth and use this as our target for coastal job growth.

I’m using the blscrapeR package and have my own API access key.

#devtools::install_github("keberwein/blscrapeR")
library(blscrapeR)
#read in API access key that is saved on Mazu
blsKey <- read_rtf(file.path(dir_M,'git-annex/neprep/keys/BureauofLaborStatistics.rtf'))
set_bls_key(blsKey, overwrite=TRUE)

After some researching on the BLS website, I found that the table I am interested in is identified by “ENUUS00010010”.

us_employment <- bls_api("ENUUS00010010",
              startyear = 2001, endyear = 2016, registrationKey = "BLS_KEY", annualaverage=TRUE) %>%
  as.data.frame() %>%
  filter(periodName == "Annual") %>%
  select(year, us_jobs = value) %>%
  arrange(year) %>%
  mutate(us_jobs_3yr = rollapply(us_jobs, 3, FUN=mean, align = "right", na.rm=F, partial = T),
         us_jobs_prev_3yr = lag(us_jobs_3yr, n = 1),
         us_job_growth = (us_jobs/us_jobs_prev_3yr)-1) %>%
  select(year, us_job_growth)

ggplot(us_employment, aes(x = year, y = us_job_growth)) +
  geom_line() +
  labs(x = "Year",
       y = "Job growth rate",
       title = "National job growth rate:\ncomparing annual employment to average employment of the previous 3 years") +
  theme_bw()

Combine US job growth dataset with the coastal job dataset to make comparisons between job growth rates.

jobs <- jobs_cst_ref %>%
  left_join(us_employment) %>%
  select(year, rgn_id, rgn_name, coast_job_growth, us_job_growth)

write.csv(jobs, file.path(dir_calc, "layers/le_job_growth.csv"))
jobs_plot <- jobs %>%
  gather("series", "rate", na.rm = T, -year, -rgn_id, -rgn_name)

ggplot(jobs_plot, aes(x = year, y = rate, color = series)) +
  geom_hline(yintercept = 0, color = "black") +
  geom_line() +
  facet_wrap(~rgn_name) +
  theme_bw() +
  labs(y = "Job Growth Rate",
       x = "Year",
       color = "") +
  scale_color_manual(" ", labels = c("Regional","National"), values = c("red", "blue"))

We see that regional growth rate is often above the national average with some exceptions in Massachusetts-Virginian and New Hampshire. There is a dip in all data series around the years 2008-2010, coinciding with the recession of the time.


References

National Ocean Economics Program. Ocean Economic Data by Sector & Industry., ONLINE. 2012. Available: http://www.OceanEconomics.org/Market/oceanEcon.asp [3 July 2018]

Bureau of Labor Statistics, U.S. Department of Labor, Quarterly Census of Employment and Wages. 7/24/2016. http://www.bls.gov/cew/](http://www.bls.gov/cew/).